
Researchers from the Aerospace Information Research Institute of the Chinese Academy of Sciences, in collaboration with Chongqing University of Posts and Telecommunications, have developed a high-resolution daily atmospheric carbon dioxide (CO2) dataset covering China from 2016 to 2020. The dataset offers new insights into the spatiotemporal variations of column-averaged dry-air CO2 mole fraction (XCO2).
To construct the dataset, the research team employed a novel XGBoost–Bayesian Optimization (XGBoost-BO) framework. This approach addresses key limitations of existing satellite-based carbon monitoring systems, including spatial imbalance, temporal discontinuity, and sensitivity to meteorological conditions. To enhance model interpretability, the team integrated SHapley Additive exPlanations (SHAP), which enables quantitative assessment of the relative contributions of climate factors, vegetation dynamics, and human activities to XCO2 variability.
With a spatial resolution of 0.1° × 0.1°, the dataset provides continuous daily XCO2 coverage across China, filling gaps in satellite-retrieved observations. It incorporates multi-source data, including satellite measurements from OCO-2 and GOSAT, ground-based observations from TCCON, vegetation indices (NDVI, EVI), meteorological variables from ERA5-Land, anthropogenic emissions data (ODIAC), nighttime light data from VIIRS, and global fire emissions data (GFED). This integration ensures comprehensive representation of both natural and anthropogenic CO2 sources and sinks.
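To illustrate why satellite-only coverage leaves gaps on a 0.1° × 0.1° grid, the following minimal sketch bins scattered soundings into grid cells and averages them per cell. It is not the team's preprocessing code: the function names, the cell-indexing scheme, and the sample values are all hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict

CELL_DEG = 0.1  # grid spacing in degrees, matching the dataset's stated resolution


def cell_index(lat, lon, cell_deg=CELL_DEG):
    """Return the (row, col) index of the grid cell containing a point.

    The quotient is rounded to 9 decimals before flooring to reduce
    floating-point surprises near cell edges.
    """
    row = math.floor(round(lat / cell_deg, 9))
    col = math.floor(round(lon / cell_deg, 9))
    return (row, col)


def grid_mean(soundings):
    """Average XCO2 per grid cell from (lat, lon, xco2) tuples."""
    acc = defaultdict(lambda: [0.0, 0])  # cell -> [running sum, count]
    for lat, lon, xco2 in soundings:
        entry = acc[cell_index(lat, lon)]
        entry[0] += xco2
        entry[1] += 1
    return {cell: total / count for cell, (total, count) in acc.items()}


# Hypothetical soundings for one day (values invented for illustration).
soundings = [
    (31.93, 117.17, 412.0),
    (31.97, 117.12, 414.0),  # falls in the same cell as the first sounding
    (39.75, 116.96, 416.0),
]
gridded = grid_mean(soundings)
```

Only cells that contain at least one sounding appear in `gridded`; every other cell is empty for that day. A gap-filling model such as the one described here is what supplies values for those empty cells.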
Validation results indicate that the dataset is accurate and reliable. Comparisons with OCO-2 observations yield an R² of 0.98, a root mean square error (RMSE) of 0.58 ppm, and a mean absolute percentage error (MAPE) of 0.07%. Independent evaluation against TCCON observations at the Hefei and Xianghe sites shows that the dataset consistently outperforms the CAMS global greenhouse gas reanalysis. At the Hefei site, for instance, the dataset achieves an R² of 0.92, an RMSE of 1.16 ppm, and a MAPE of 0.2%, compared with 0.88, 1.39 ppm, and 0.3% for CAMS.
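For readers unfamiliar with these three metrics, the sketch below shows one common way to compute them. It assumes R² means the coefficient of determination (some studies report the squared Pearson correlation instead), and the sample values are invented for illustration rather than taken from the study.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the same unit as the inputs."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


# Hypothetical XCO2 values in ppm (not from the study).
observed = [410.0, 412.0, 414.0, 416.0]
predicted = [410.5, 411.5, 414.5, 415.5]

r2_value = r_squared(observed, predicted)
rmse_value = rmse(observed, predicted)
mape_value = mape(observed, predicted)
```

Because XCO2 values sit near 400 ppm, an absolute error of about half a ppm corresponds to a percentage error of roughly a tenth of a percent. This is why the reported MAPE figures are so small compared with the RMSE figures.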
By providing high-precision, continuous XCO2 data, the dataset facilitates more accurate characterization of carbon cycle processes and supports evidence-based environmental governance.
The study was recently published in Scientific Data. This work was supported by the National Key Research and Development Program of China.